Q-gram Analysis and Urn Models

Author

  • Pierre Nicodème
Abstract

Words of fixed size q are commonly referred to as q-grams. We consider the problem of q-gram filtration, a method commonly used to speed up sequence comparison. We are interested in the statistics of the number of q-grams common to two random texts (where multiplicities are not counted) in the non-uniform Bernoulli model. In the exact and dependent model, when omitting border effects, a q-gram in a random sequence depends on the q − 1 preceding q-grams. In an approximate and independent model, we randomly draw a q-gram at each position, independently of the other positions. Using ball and urn models, we analyze the independent model. Numerical simulations show that this model is an excellent first-order approximation to the dependent model. We provide an algorithm to compute the moments.
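The two models described in the abstract can be compared with a small simulation. The following is a minimal sketch, not the paper's algorithm: the function names, the two-letter alphabet, and the parameter values are all assumptions made for illustration. It treats each possible q-gram as an urn and each text position as a ball, and it uses the elementary fact that, in the independent model, an urn with probability p is non-empty after m throws with probability 1 − (1 − p)^m.

```python
import itertools
import random


def qgram_probabilities(letter_probs, q):
    """Probability of every q-gram when letters are i.i.d. (Bernoulli model)."""
    probs = {}
    for gram in itertools.product(letter_probs, repeat=q):
        p = 1.0
        for letter in gram:
            p *= letter_probs[letter]
        probs["".join(gram)] = p
    return probs


def distinct_qgrams_dependent(letter_probs, n, q, rng):
    """Dependent model: draw a text of length n and collect its overlapping q-grams."""
    letters = list(letter_probs)
    weights = [letter_probs[a] for a in letters]
    text = "".join(rng.choices(letters, weights=weights, k=n))
    return {text[i:i + q] for i in range(n - q + 1)}


def distinct_qgrams_independent(gram_probs, n, q, rng):
    """Independent model: draw one q-gram per position, independently of the others."""
    grams = list(gram_probs)
    weights = [gram_probs[g] for g in grams]
    return set(rng.choices(grams, weights=weights, k=n - q + 1))


def expected_common_independent(gram_probs, m1, m2):
    """Exact mean number of common q-grams in the independent model.

    m1 and m2 are the numbers of positions (balls) in the two texts.
    """
    return sum(
        (1 - (1 - p) ** m1) * (1 - (1 - p) ** m2) for p in gram_probs.values()
    )


def mean_common(sample, trials):
    """Monte Carlo mean of the number of q-grams shared by two sampled texts."""
    return sum(len(sample() & sample()) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to keep the run short.
    letter_probs = {"a": 0.7, "b": 0.3}
    n, q, trials = 50, 4, 2000
    rng = random.Random(0)
    gram_probs = qgram_probabilities(letter_probs, q)
    m = n - q + 1

    dep = mean_common(lambda: distinct_qgrams_dependent(letter_probs, n, q, rng), trials)
    ind = mean_common(lambda: distinct_qgrams_independent(gram_probs, n, q, rng), trials)
    print("dependent model (simulated):  ", dep)
    print("independent model (simulated):", ind)
    print("independent model (exact mean):", expected_common_independent(gram_probs, m, m))
```

Only the mean is computed in closed form here; higher moments and any statement about how close the two models are would need the analysis given in the paper itself.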

Similar resources

Pólya Urn Models and Connections to Random Trees: A Review

This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology:

  • Pólya-Eggenberger's urn
  • Bernard Friedman's urn
  • Generalized Pólya urns
  • Extended urn schemes
  • Invertible urn schemes ...

Branches in random recursive k-ary trees

In this paper, using generalized Pólya urn models, we find the expected size of a branch in recursive k-ary trees. We also find the expected number of nodes of a given outdegree in a branch of such trees.

Pólya-Type Urn Models with Multiple Drawings

We investigate the distribution, mean value, variance and some limiting properties of an urn model of white and red balls under random multiple drawing (either with or without replacement) when the number of white and red balls added follows a schedule that depends on the number of white balls chosen in each drawing.

Asymptotic normality of urn models for clinical trials with delayed response

Response-adaptive design involves the sequential selection of design points chosen depending on the outcomes at previously selected design points. The response-adaptive design has been extensively studied in the literature; see Rosenberger (1996), Flournoy and Rosenberger (1995) and Hu and Ivanova (2004) for details. An important family of adaptive designs can be developed from the generalized ...

On death processes and urn models

In this work we are concerned with so-called Pólya-Eggenberger urn models, which in the simplest case of two colors can be described as follows. At the beginning, the urn contains n white and m black balls. At every step, we choose a ball at random from the urn, examine its color and put it back into the urn and then add/remove balls according to its color by the following rules: if the ball is...

Publication date: 2003